{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ntGNDuSCeAR2"
   },
   "source": [
    "# FastEmbed on GPU\n",
    "\n",
    "As of version 0.2.7, FastEmbed supports GPU acceleration.\n",
    "\n",
    "This notebook covers installing and using FastEmbed on a GPU.\n",
    "\n",
    "## Installation\n",
    "\n",
    "FastEmbed depends on `onnxruntime` and inherits its approach to GPU support.\n",
    "\n",
    "To run ONNX models on a GPU, you need the `onnxruntime-gpu` package, which replaces `onnxruntime` and provides all of its functionality.\n",
    "FastEmbed mirrors this behavior and requires the `fastembed-gpu` package to be installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GK2XADwUeEK7"
   },
   "outputs": [],
   "source": [
    "!pip install fastembed-gpu"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3aiGPqjCeGzo"
   },
   "source": [
    "**NOTE**: `onnxruntime-gpu` and `onnxruntime` can't be installed in the same environment. If you have `onnxruntime` installed, you need to uninstall it before installing `onnxruntime-gpu`. The same is true for `fastembed` and `fastembed-gpu`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3xx3r-9jgAMi"
   },
   "source": [
    "### CUDA 12.x support\n",
    "You can check your CUDA version with commands such as `nvidia-smi` or `nvcc --version`.\n",
    "\n",
    "Starting from version 1.19.0, `onnxruntime-gpu` ships with support for CUDA 12.x by default.\n",
    "\n",
    "By default, Google Colab notebooks come with CUDA 12.x and CuDNN 8.x.\n",
    "\n",
    "The latest version of `onnxruntime-gpu` requires CuDNN 9.x. To install it, run the following commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo apt install -y cudnn9\n",
    "!pip install fastembed-gpu -qqq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you need to work with CuDNN 8, consider pinning `onnxruntime-gpu` to 1.18.0 built for CUDA 12.x with these commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install onnxruntime-gpu==1.18.0 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq\n",
    "!pip install fastembed-gpu -qqq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CUDA 11.x support\n",
    "To use the latest version of `onnxruntime-gpu` with CUDA 11.x, run the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ -qq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE**: Ensure that CuDNN 9.x is installed when working with the latest `onnxruntime-gpu`, whether using CUDA 11.x or 12.x."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Igv5RXhSeO68"
   },
   "source": [
    "### CUDA drivers\n",
    "\n",
    "FastEmbed does not include CUDA drivers or CuDNN libraries.\n",
    "You need to set up the environment yourself.\n",
    "The dependencies required for the chosen onnxruntime version are listed in the [CUDA Execution Provider requirements](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setting up fastembed-gpu on GCP\n",
    "\n",
    "#### CUDA drivers\n",
    "Install the [CUDA 11.8 toolkit](https://developer.nvidia.com/cuda-11-8-0-download-archive) or the [CUDA 12.x toolkit](https://developer.nvidia.com/cuda-downloads) if it has not been set up yet.\n",
    "\n",
    "#### Example of setting up CUDA 12.x on Ubuntu 22.04\n",
    "Make sure to download the archive built for your platform, CPU architecture, and OS distribution.\n",
    "\n",
    "For Ubuntu 22.04 on x86_64, download the following [archive](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network).\n",
    "\n",
    "```bash\n",
    "wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb\n",
    "sudo dpkg -i cuda-keyring_1.1-1_all.deb\n",
    "sudo apt-get update\n",
    "sudo apt-get -y install cuda\n",
    "```\n",
    "**NOTE**: Specific CUDA libraries can be found in the [meta packages section](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#meta-packages) in the CUDA installation guide.\n",
    "\n",
    "**NOTE**: After installing CUDA, the `LD_LIBRARY_PATH` environment variable might not include the CUDA libraries by default. Make sure to add the following line to your environment variables:\n",
    "```bash\n",
    "export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH\n",
    "```\n",
    "This ensures that the CUDA libraries can be found at runtime.\n",
    "\n",
    "#### CuDNN 9.x\n",
    "The CuDNN 9.x library can be installed from the following [archive](https://developer.nvidia.com/rdp/cudnn-archive).\n",
    "\n",
    "#### Example of setting up CuDNN 9.x on Ubuntu 22.04\n",
    "The CuDNN 9.x [archive](https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network) for Ubuntu 22.04 x86_64 can be downloaded and installed as follows:\n",
    "```bash\n",
    "wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb\n",
    "sudo dpkg -i cuda-keyring_1.1-1_all.deb\n",
    "sudo apt-get update\n",
    "sudo apt-get -y install cudnn\n",
    "```\n",
    "**NOTE**: When installing CuDNN, you can choose a specific version: `cudnn-cuda-11` or `cudnn-cuda-12`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Common issues\n",
    "\n",
    "The following issues commonly arise when the dependencies of `fastembed-gpu` are not installed properly:\n",
    "\n",
    "CUDA library is not installed:\n",
    "```bash\n",
    "FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.x: cannot open shared object file: No such file or directory\n",
    "```\n",
    "\n",
    "\n",
    "CuDNN library is not installed:\n",
    "```bash\n",
    "FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.x: cannot open shared object file: No such file or directory\n",
    "```\n",
    "\n",
    "\n",
    "CUDA library path is not set:\n",
    "```bash\n",
    "FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.x: failed to map segment from shared object\n",
    "```\n",
    "\n",
    "To fix this, make sure to add the following line to your environment variables:\n",
    "```bash\n",
    "export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 334,
     "referenced_widgets": [
      "aacf08a7aa444b64a2efad1967d28a53",
      "5606aa785de74d65a9928b31c0be8a53",
      "d4ec9d3b74ec4412894da2161ed2bddf",
      "8edd544c3e074ec1813e5b9d1aef43d9",
      "9898890f8a75468ea20e3ce319d0b6e2",
      "da3b18abb16241a0a7191ee9afcb0510",
      "258a619168824253a6a329efdc51ebe6",
      "53c7cdc967d24faba0b5c659c94c50b8",
      "e9348d8be28d408e8e760c71b21ab294",
      "0ba06e0816714f2fbdec8260f160abc0",
      "0b96563334964d449dd34f35b6b3e715",
      "11c2eec490e8479b944eec7f30cb1ca2",
      "91463da0d1c5466795e06ab586002259",
      "30f4f7833406474f89ef0700b00a33aa",
      "4302c304ec6a4b5985797e300bd7e353",
      "2605640c7b824ed7aa137d404e14b774",
      "b02efe3a33d04f06aa8938719ab35671",
      "50408e5d052343b1a1b44a0fae0f801d",
      "1be01c95d9e84f8ea88367c987a72fdc",
      "a109c13bc93a449186424542dc330be8",
      "4adce304ce1947b5a01dde10bbb3bb8c",
      "a761366a37e44837a25e0f25b18efed2",
      "94512b9055e546389471197b76ad5449",
      "072dca00bd7b4918a178f90ccabf698a",
      "48a856c59ef74cc3834521b1bf616541",
      "c020c503aeaa464cad643ade5ee3ae24",
      "a4e7e40c0bbd4f878c20a9f65fe3a048",
      "e8c0a1c339fd47668d944a9defad79d4",
      "cd782d35c6bd40c0a60d57b1828a7251",
      "04f638ab08da4d20928644c4ba03f8ef",
      "17f20477fc79475f97adf1c1f64a4192",
      "96f7b5a2e224462e9fcffd03f906a593",
      "755cd32d9fc9407c80a160f45c802d1e",
      "a886258e7cd14c048b58391d7b772901",
      "bc3e48f826a74840867a6209e622b75e",
      "125b2ac0f78043bba7eca53474ca44c4",
      "82f186d1ffb4435d94a6c7e9025242ef",
      "77000333e5ca4094be291ad82d4a627a",
      "7fe64fb53055431488d002c76c8e331e",
      "7de59ae9919f4a5bb2b6e601a3c02412",
      "97a69423a6644eab87fc636e182f23a4",
      "4df936d1065b41f4bf02ed394fdf7b7e",
      "3918bd1affa3454e8e9044a418a056ea",
      "163b27ae0bce41e5b48efcb4b3fd780d",
      "94631fd6e0744085bc79c3121de4a9f7",
      "31cd98d66bc54418b35e70fbbc0fa3c0",
      "6d21627a638b4ddca6fe7bfb80a621b5",
      "b37bed9dc4fe45c08b8397288fe5b1a9",
      "164fef95d1414177a40d563f5682f6a3",
      "1a9a0ea53448413a8e4b360b7bb69e26",
      "dd1a4483b4b045c6929e3d2cf1338f63",
      "496ddd8e05f949cd8cbba8e677f476ac",
      "2813be951d7f48b2aad1dd4a444ce3eb",
      "8e9a2c2dd21942edbdfecb3b7dffc70b",
      "08a10fe247f1425db044cfc13f2fb384",
      "b8786aded92d421592bc7623c5c7899e",
      "c91a20a9433d4016ba2db69fa50e0b4d",
      "e997820738594c6dadb061908d7afdc1",
      "a5fc751f81ae498f9aa55ece0e6853b2",
      "2aee4fc8cda64c5eb8722be81e48e0ca",
      "3a53e8624dff48b3959875ef58ee99ce",
      "50a70044f77542108fe188598e70797e",
      "13cf998b35ae4507a63e797f6fa3eada",
      "6209eb6a68cf4a378767ef34d0d9216d",
      "7395db766b944af9b41d6b56c9ada0b1",
      "42122c317ec648688f0164a1adb5df28"
     ]
    },
    "id": "Ttf4YggPeQQK",
    "outputId": "aa75129d-9e2d-4c88-cf03-251dd43a11b1"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: \n",
      "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
      "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
      "You will be able to reuse this secret in all of your notebooks.\n",
      "Please note that authentication is recommended but still optional to access public models or datasets.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "aacf08a7aa444b64a2efad1967d28a53",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "11c2eec490e8479b944eec7f30cb1ca2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer_config.json:   0%|          | 0.00/1.24k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "94512b9055e546389471197b76ad5449",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "config.json:   0%|          | 0.00/706 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a886258e7cd14c048b58391d7b772901",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "special_tokens_map.json:   0%|          | 0.00/695 [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "94631fd6e0744085bc79c3121de4a9f7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b8786aded92d421592bc7623c5c7899e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "model_optimized.onnx:   0%|          | 0.00/66.5M [00:00<?, ?B/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "['CUDAExecutionProvider', 'CPUExecutionProvider']"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "from fastembed import TextEmbedding\n",
    "\n",
    "embedding_model_gpu = TextEmbedding(\n",
    "    model_name=\"BAAI/bge-small-en-v1.5\", providers=[\"CUDAExecutionProvider\"]\n",
    ")\n",
    "embedding_model_gpu.model.model.get_providers()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "iPtoHf7GeV-i"
   },
   "outputs": [],
   "source": "documents: list[str] = list(np.repeat(\"Demonstrating GPU acceleration in fastembed\", 500))"
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "islhyLf4ed-H",
    "outputId": "8c8ed09b-9eac-438f-97bc-578751975148"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "43.4 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "list(embedding_model_gpu.embed(documents))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 67,
     "referenced_widgets": [
      "9c306ce5188c45feb8dfb9089592591c",
      "296ff54c6e61441f978084df59626598",
      "d6d42b4f245a49b7ba7769e23a3202fc",
      "39ce7754480147759c16a3089d8105af",
      "8253960a069d4106863a75faae54b90d",
      "7ccf959452af4c0b873c7567747f0816",
      "ac9d0b5a5b1f401e90a1cc9ffe6d4b4c",
      "0aada067dec3472f9aba1772d6b775a5",
      "07597b1287e04653b80c47a771549376",
      "054be1dd9f084cae911745b692ccd929",
      "ab19e8e831694e308a4b79f05aff728e"
     ]
    },
    "id": "bOKVUvWJegYJ",
    "outputId": "dde74917-08b0-4ce2-9a2b-cc31e02cafb2"
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9c306ce5188c45feb8dfb9089592591c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "['CPUExecutionProvider']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "embedding_model_cpu = TextEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")\n",
    "embedding_model_cpu.model.model.get_providers()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "0NJj9RvSfASP",
    "outputId": "526f5280-99bd-454e-8af8-6a860ad96e54"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.33 s ± 591 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit\n",
    "list(embedding_model_cpu.embed(documents))"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
